Abstract—Sparsification reduces the size of networks while preserving structural and statistical properties of interest. Various sparsifying algorithms have been proposed in different contexts. We contribute the first systematic conceptual and experimental comparison of edge sparsification methods on a diverse set of network properties. It is shown that they can be understood as methods for rating edges by importance and then filtering globally by these scores. In addition, we propose a new sparsification method (Local Degree) which preserves edges leading to local hub nodes. All methods are evaluated on a set of 100 Facebook social networks with respect to network properties including diameter, connected components, community structure, and multiple node centrality measures. Experiments with our implementations of the sparsification methods (using the open-source network analysis tool suite NetworKit) show that many network properties can be preserved down to about 20% of the original set of edges. Furthermore, the experimental results allow us to differentiate the behavior of different methods and show which method is suitable with respect to which property. Our Local Degree method is fast enough for large-scale networks and performs well across a wider range of properties than previously proposed methods.

Keywords: complex networks, sparsification, backbones, 
network reduction, edge sampling 

I. Introduction 

A. Context 

Complex networks have nontrivial structures and statistical properties and are often represented by graphs. Such data models have been employed in countless domains based on the observation that the structure of relationships yields insights into the composition and behavior of complex systems [1]. Many concepts were pioneered in the study of social networks, in which edges represent social ties between social actors. Most real-world complex networks, including social networks, are already sparse in the sense that for $n$ nodes the edge count $m$ is asymptotically in $O(n)$. Nonetheless, typical densities lead to a computationally challenging number of edges. Here we pursue the goal of further sparsifying such networks by retaining just a fraction of edges (sometimes called a “backbone” of the network), while showing experimentally that important properties of networks can be preserved in the process.

Potential applications of network sparsification are numerous. One of them is information visualization: even moderately sized networks turn into “hairballs” when drawn with standard techniques, as the number of edges is visually overwhelming. In contrast, showing only a fraction of edges can reveal network structures to the human eye if these edges are selected appropriately. Sparsification can also be applied as an acceleration technique: by disregarding a large fraction of edges that are unimportant for the task, running times of graph and network analysis algorithms can be reduced. From a network science perspective, sparsification can yield valuable insights into the importance of relationships and the participating nodes: given that a sparsification method tends to preserve a certain property, the method can be used to rank or classify edges, discriminating between essential and redundant edges. Many other possible applications arise if we think of sparsification as lossy compression: large networks can be strongly reduced in size if we are only interested in certain structural aspects that are preserved by the sparsification method.

The core idea of the research presented here is that not all edges are equally important with respect to properties of a network. For example, a relatively small fraction of long-range edges typically acts as shortcuts and is responsible for the small-world phenomenon in complex networks. The importance of edges can be quantified, leading to edge scores, often also referred to as edge centrality values. In general, we subsume under these terms any measure that quantifies the importance of an edge depending on its position within the network structure. Sparsification can then be broken down into the stages of (i) edge scoring and (ii) filtering the edges using a global score threshold.
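The two-stage view can be illustrated with a minimal sketch, assuming an undirected graph stored as a Python dict that maps integer node ids to neighbor sets. The function names and the rule of keeping the `round(ratio * m)` highest-scoring edges are illustrative assumptions, not the NetworKit interface; any edge score from Section III could be plugged in as `score`.

```python
# Illustrative sketch only: names and the top-fraction rule are assumptions,
# not the NetworKit API.
from typing import Callable, Dict, List, Set, Tuple

Graph = Dict[int, Set[int]]   # undirected: v in graph[u] iff u in graph[v]
Edge = Tuple[int, int]        # each edge stored once, with u < v


def edge_list(graph: Graph) -> List[Edge]:
    """Each undirected edge exactly once, as (u, v) with u < v."""
    return [(u, v) for u in graph for v in graph[u] if u < v]


def filter_by_score(graph: Graph, score: Callable[[Edge], float],
                    ratio: float) -> Graph:
    """Stage (ii): keep the round(ratio * m) highest-scoring edges.

    Stage (i) is whatever `score` computes. Keeping the top fraction is the
    same as applying one global score threshold (up to tie-breaking).
    All nodes are retained, only edges are removed.
    """
    ranked = sorted(edge_list(graph), key=score, reverse=True)
    kept = ranked[:round(ratio * len(ranked))]
    sparse: Graph = {u: set() for u in graph}
    for u, v in kept:
        sparse[u].add(v)
        sparse[v].add(u)
    return sparse


if __name__ == "__main__":
    # A triangle 0-1-2 with a pendant node 3 attached to node 2.
    g: Graph = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
    # Toy score for illustration only: sum of the endpoint degrees.
    degree_sum = lambda e: len(g[e[0]]) + len(g[e[1]])
    print(filter_by_score(g, degree_sum, 0.5))
```

Separating the score from the filter is what allows the same threshold step to be reused for every method, and to be re-run cheaply for different ratios once the scores are known.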

Despite the similar terminology, our work is only weakly related to a line of research in theoretical computer science where graph sparsification is understood as the reduction of a dense graph ($O(n^2)$ edges) to a sparse graph ($O(n)$ edges) while provably preserving properties such as spectral properties (e.g. […]). The networks of our interest are already sparse in this sense.
With the goal of reducing network data size while keeping important properties, our research is related to a body of work that considers sampling from networks (on which […] provides an extensive overview). Sampling is concerned with the design of algorithms that select edges and/or nodes from a network. Here, node and edge sampling methods must be distinguished: for node sampling, nodes and edges from the original network are discarded, while edge sampling preserves all nodes and reduces the number of edges only. The literature on node sampling is extensive, while pure edge sampling and filtering techniques have not been considered as often. A seminal paper […] concludes that node sampling techniques are preferable, but considers few edge sampling techniques. The study presented in […] looks at how well a sample of 5–20% of the original network preserves certain properties, and is mainly focused on node sampling through graph exploration. It concludes that random walk-based node sampling works best on complex networks, but does so on the basis of experiments on synthetic graphs only and compares only with very simple edge sampling methods.

Only edge sampling techniques are directly comparable to our edge scoring and filtering methods. In this work, we restrict ourselves to reducing the edge set, while keeping all nodes of the original graph. Preserving the nodes allows us to infer properties of each node of the original graph. This is important because in network analysis, the unit of analysis is often the individual node, e.g. when a score is to be computed for each user of an online social network. With respect to the goal of accelerating the analysis, many relevant graph algorithms scale with $m$ rather than $n$, so reducing $m$ is more relevant.

Another related approach is the Multiscale Backbone […], which is applicable to weighted graphs only and is therefore not included in our study. Instead of applying a global edge weight cutoff for edge filtering, which hides important structures at different scales, this approach aims at preserving them at all scales.

B. Contribution 

We contribute the first systematic conceptual and experimental comparison of existing and novel edge scoring and filtering methods on a diverse set of network properties. Descriptions and literature references for the related methods which we reimplemented are given in Section III. Our results illuminate which methods are suitable with respect to which properties of a network. In particular, the Local Degree method we propose is based on simple principles but surprisingly effective across a wider range of properties than previously proposed methods. Furthermore, upon acceptance, we publish efficient parallelized implementations and a framework for such methods as part of the NetworKit open-source tool suite […]. While our study covers various approaches from the literature, it is by no means exhaustive due to the vast number of potential sparsification techniques. With future methods in mind, we hope to contribute a framework for their implementation and evaluation.

II. Network Properties 

The structure of a complex network is usually characterized in terms of certain key figures and statistics [8]. Decomposition of the network into cohesive regions is a frequent analysis task: all nodes in a connected component are reachable from each other. Communities are subsets of nodes that are internally densely and externally sparsely connected. The diameter of a graph is the length of its longest shortest path. The observation that the diameter of social networks is often surprisingly small is referred to as the small-world phenomenon. In the case of disconnected graphs, we consider the diameter of the largest component.

Node centrality measures quantify the relative importance of a node within the network structure. The distribution of degrees, i.e. the number of connections per node, plays an important role in characterizing a network: empirically observed complex networks tend to show a heavy-tailed degree distribution which follows a power law with a characteristic exponent $\gamma$: $p(k) \sim k^{-\gamma}$. Clustering coefficients are key figures for the amount of transitivity in networks, i.e. the tendency of edges to form between indirect neighbor nodes. Betweenness centrality expresses the concept that a node is important if it lies on many shortest paths between nodes in the network. PageRank assigns relative importance to nodes according to their connections, incorporating the idea that edges to high-scoring nodes contribute more. While this collection is not and cannot be exhaustive, we choose these common measures for our experimental study (Section V).
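To make the clustering coefficient concrete, the following minimal sketch computes the local clustering coefficient of a node (the fraction of pairs of its neighbors that are themselves adjacent) and its unweighted average over all nodes. The dict-of-sets graph representation and the convention that nodes of degree below 2 contribute a coefficient of 0 are assumptions of this sketch; other conventions, such as excluding those nodes from the average, are also in use.

```python
# Illustrative sketch; the degree < 2 convention is an assumption.
from typing import Dict, Set

Graph = Dict[int, Set[int]]


def local_clustering(graph: Graph, u: int) -> float:
    """Fraction of pairs of neighbours of u that are themselves adjacent."""
    nbrs = sorted(graph[u])
    d = len(nbrs)
    if d < 2:
        return 0.0  # assumed convention for degree 0 or 1
    links = sum(1 for i in range(d) for j in range(i + 1, d)
                if nbrs[j] in graph[nbrs[i]])
    return 2.0 * links / (d * (d - 1))


def average_local_clustering(graph: Graph) -> float:
    """Unweighted mean of the local clustering coefficients of all nodes."""
    return sum(local_clustering(graph, u) for u in graph) / len(graph)


if __name__ == "__main__":
    g: Graph = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
    print(average_local_clustering(g))  # (1 + 1 + 1/3 + 0) / 4
```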

III. Sparsification Methods

All sparsification methods we consider can be split 
up into two stages: (i) the calculation of a score for each 
of the m edges in the input graph (where the score is 
high if the edge is important) and (ii) subsequent global 
filtering according to these scores. In this section we 
present the existing and new approaches we consider 
and show for each of these methods how it can be 
transformed into an edge score that can be used for 
global filtering. 

Random Edge (RE): When studying different 
sparsification algorithms, the performance of random 
edge selection is an important baseline. As we shall 
see, it also performs surprisingly well. The method 
selects edges uniformly at random from the original 
set such that the desired sparsification ratio is obtained. 
This is equivalent to scoring edges with values chosen uniformly at random. Naturally, this takes time linear in the number of edges.
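The equivalence between uniform edge selection and uniformly random scores followed by global filtering can be sketched as follows. With independent uniform scores every ordering of the edges is equally likely (floating-point ties are practically impossible), so the top $k$ edges form a uniformly random $k$-subset. The helper names and the optional `seed` parameter are illustrative assumptions of this sketch.

```python
# Illustrative sketch; helper names and the seed parameter are assumptions.
import random
from typing import Dict, Optional, Set, Tuple

Graph = Dict[int, Set[int]]
Edge = Tuple[int, int]


def random_edge_scores(graph: Graph, seed: Optional[int] = None) -> Dict[Edge, float]:
    """RE scores: an independent uniform value in [0, 1) for each edge (u < v)."""
    rng = random.Random(seed)
    # Iterate in sorted order so that a fixed seed gives reproducible scores.
    return {(u, v): rng.random()
            for u in sorted(graph) for v in sorted(graph[u]) if u < v}


def keep_top(scores: Dict[Edge, float], ratio: float) -> Set[Edge]:
    """Global filtering: the round(ratio * m) edges with the highest scores."""
    k = round(ratio * len(scores))
    return set(sorted(scores, key=scores.get, reverse=True)[:k])


if __name__ == "__main__":
    g: Graph = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
    print(keep_top(random_edge_scores(g, seed=42), 0.5))  # two random edges
```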
Triangles: Especially in social networks, triangles play an important role because the presence of a triangle indicates a certain quality of the relationship between the three involved nodes. The sociological theory of Simmel […] states that “triads (sets of three actors) are fundamentally different from dyads (sets of two actors) by way of introducing mediating effects.” In a friendship network, it is likely for two actors with a high number of common friends to be friends as well. Filtering globally by triangle counts tends to destroy local structures, but several of the following sparsification methods are based on the triangles edge score $T(u,v)$ that denotes for an edge $\{u,v\}$ the number of triangles it belongs to. The time needed for counting the number of all triangles is $O(m \cdot a)$ [10], where $a$ is the graph’s arboricity […].
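Since the triangles containing an edge $\{u,v\}$ correspond one-to-one to the common neighbors of $u$ and $v$, the score is $T(u,v) = |N(u) \cap N(v)|$. The sketch below scans the smaller of the two neighborhoods and probes the other with hash-set lookups. The sum of $\min(d(u), d(v))$ over all edges is bounded by $2a \cdot m$ (Chiba and Nishizeki), which suggests $O(m \cdot a)$ expected time, but this is a simplified illustration rather than the counting implementation used in the experiments.

```python
# Illustrative sketch of per-edge triangle counts T(u, v).
from typing import Dict, Set, Tuple

Graph = Dict[int, Set[int]]
Edge = Tuple[int, int]


def triangle_scores(graph: Graph) -> Dict[Edge, int]:
    """T(u, v): number of triangles containing edge {u, v} (= common neighbours)."""
    scores: Dict[Edge, int] = {}
    for u in graph:
        for v in graph[u]:
            if u < v:
                # Scan the smaller neighbourhood, probe the larger one in O(1).
                small, large = sorted((graph[u], graph[v]), key=len)
                scores[(u, v)] = sum(1 for w in small if w in large)
    return scores


if __name__ == "__main__":
    g: Graph = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
    print(triangle_scores(g))
```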

Local Similarity (LS): One line of research attempts to sparsify graphs with the goal of speeding up data mining algorithms. Satuluri et al. […] propose a local graph sparsification method with the intention of speedup and quality improvement of community detection. They suggest reducing the edge set to 10–20% of the original graph and use the Jaccard measure to quantify the overlap between node neighborhoods $N(u)$, $N(v)$ and thereby the similarity of two given nodes:

$$J(u,v) = \frac{|N(u) \cap N(v)|}{|N(u) \cup N(v)|} = \frac{T(u,v)}{d(u) + d(v) - T(u,v)},$$

where $d(u)$ denotes the degree of $u$. Global sparsification approaches tend to destroy small network structures that are relevant from a local point of view. In order to achieve local instead of global sparsification, Satuluri et al. keep for each node $u$ the top $\lfloor d(u)^{\alpha} \rfloor$ edges incident to $u$, ranked according to their similarity ($\alpha \in [0,1]$). Note that this procedure ensures that at least one incident edge of each node is retained. This is equivalent to assigning each edge the score $1 - \alpha$ for the minimum value of $\alpha$ such that the edge is kept in the sparsified graph and filtering by this edge score. The time needed for calculating this edge score is the time for counting all triangles and for sorting the neighbors of all nodes, which can be done in $O(m \log d_{\max})$. The authors also propose a fast approximation which runs in time $O(m)$. This sparsification technique has also been adapted for accelerating collective classification, i.e. the task of inferring the labels of all nodes in a graph given a subset of labeled nodes […].
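The conversion of the local rule into a global edge score can be sketched as follows. A node $u$ keeps its neighbor of rank $r$ iff $\lfloor d(u)^{\alpha} \rfloor \ge r$, i.e. iff $\alpha \ge \log r / \log d(u)$, and for every $\alpha$ when $r = 1$. An edge survives if either endpoint keeps it, so its minimum $\alpha$ is the smaller of the two endpoint thresholds, and its score is $1 - \alpha_{\min}$. This sketch uses exact Jaccard values, breaks ties by node id (an assumption), and does not reproduce the authors' fast approximation.

```python
# Illustrative sketch; tie-breaking by node id is an assumption.
import math
from typing import Dict, Set, Tuple

Graph = Dict[int, Set[int]]
Edge = Tuple[int, int]


def jaccard(graph: Graph, u: int, v: int) -> float:
    """J(u, v) = T(u, v) / (d(u) + d(v) - T(u, v))."""
    t = len(graph[u] & graph[v])
    return t / (len(graph[u]) + len(graph[v]) - t)


def local_similarity_scores(graph: Graph) -> Dict[Edge, float]:
    """Score 1 - alpha_min; node u keeps its top floor(d(u)**alpha) edges by J."""
    alpha_min: Dict[Edge, float] = {}
    for u, nbrs in graph.items():
        d = len(nbrs)
        # Most similar neighbour first; ties broken by node id (sketch assumption).
        ranked = sorted(nbrs, key=lambda v: (-jaccard(graph, u, v), v))
        for r, v in enumerate(ranked, start=1):
            # floor(d**alpha) >= r  <=>  alpha >= log(r) / log(d); rank 1 always kept.
            a = 0.0 if r == 1 else math.log(r) / math.log(d)
            e = (min(u, v), max(u, v))
            # The edge is kept as soon as either endpoint keeps it.
            alpha_min[e] = min(a, alpha_min.get(e, 1.0))
    return {e: 1.0 - a for e, a in alpha_min.items()}


if __name__ == "__main__":
    g: Graph = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
    print(local_similarity_scores(g))
```

The pendant edge $\{2,3\}$ receives the maximum score 1, reflecting the guarantee that every node keeps at least one incident edge.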

Simmelian Backbones (TS, QLS): The Simmelian Backbones introduced by Nick et al. […] aim at discriminating between edges that are placed within dense subgraphs and those between them. The original goal of these methods was to produce readable layouts of networks. To achieve a “local assessment of the level of actor neighborhoods” […], the authors propose the following approach, which we adapt to our concept of edge scores. Given an edge scoring method $S$ and a node $u$, they introduce the notion of a rank-ordered neighborhood as the list of adjacent neighbors sorted by $S(u,\cdot)$ in descending order. The original (Triadic) Simmelian Backbone uses triangle counts $T$ for $S$. The newer Quadrilateral Simmelian Backbone by Nocaj et al. […] uses quadrilateral edge embeddedness, which they define as

$$Q(u,v) = \frac{q(u,v)}{\sqrt{q(u) \cdot q(v)}},$$

with $q(u,v)$ being the number of quadrangles containing edge $\{u,v\}$ and $q(u)$ being the sum of $q(u,v)$ over all neighbors $v$ of $u$. They argue that this modified version performs even better at discriminating edges within and between dense subgraphs.
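The embeddedness $Q(u,v)$ can be illustrated with a brute-force sketch. For each edge $\{u,v\}$ it counts the 4-cycles u-v-x-y-u through that edge. We assume non-induced cycles (chords allowed), which may differ from Nocaj et al.'s exact definition, and we set $Q(u,v) = 0$ when the denominator vanishes, which is another assumption. The nested loops cost up to $d(u) \cdot d(v)$ per edge, so this is for illustration only and not the efficient quadrangle counting referred to in the text.

```python
# Illustrative brute-force sketch; the non-induced cycle count and the
# zero-denominator convention are assumptions.
import math
from typing import Dict, Set, Tuple

Graph = Dict[int, Set[int]]
Edge = Tuple[int, int]


def quadrangle_counts(graph: Graph) -> Dict[Edge, int]:
    """q(u, v): number of 4-cycles u-v-x-y-u that contain edge {u, v}."""
    q: Dict[Edge, int] = {}
    for u in graph:
        for v in graph[u]:
            if u < v:
                # x is v's other cycle neighbour, y is u's; x and y must be adjacent.
                q[(u, v)] = sum(1 for x in graph[v] if x != u
                                for y in graph[u] if y != v and y != x
                                and y in graph[x])
    return q


def quadrilateral_embeddedness(graph: Graph) -> Dict[Edge, float]:
    """Q(u, v) = q(u, v) / sqrt(q(u) * q(v)), q(u) summed over u's edges."""
    q = quadrangle_counts(graph)
    q_node = {u: 0 for u in graph}
    for (u, v), c in q.items():
        q_node[u] += c
        q_node[v] += c
    return {(u, v): (c / math.sqrt(q_node[u] * q_node[v])
                     if q_node[u] * q_node[v] > 0 else 0.0)  # assumed convention
            for (u, v), c in q.items()}


if __name__ == "__main__":
    c4: Graph = {0: {1, 3}, 1: {0, 2}, 2: {1, 3}, 3: {0, 2}}  # a single 4-cycle
    print(quadrilateral_embeddedness(c4))
```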


On top of the rank-ordered neighborhood graph that is induced by the ranked neighborhoods of all nodes, Nick et al. introduce two filtering techniques, a parametric one and a non-parametric one. Like Nocaj et al., we use only the non-parametric variant. By TS, we denote the Triadic Simmelian Backbone and by QLS the Quadrilateral Simmelian Backbone. The non-parametric variant uses the Jaccard measure similar to Local Similarity but, instead of considering the whole neighborhood, it uses the maximum of the Jaccard measure of the top-$k$ neighborhoods over all possible values of $k$. While the time needed for quadrangle counting is equal to the time for triangle counting […], the overlap and Jaccard measure calculation of prefixes needs time $O(m \cdot d_{\max} \log d_{\max})$ as it needs to be calculated separately for all edges.
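The non-parametric score can be sketched under a literal reading of the description above: each node's neighbors are rank-ordered by a tie strength $S$ (triangle counts give TS; passing $Q$ instead would give QLS), and the score of $\{u,v\}$ is the maximum Jaccard coefficient of the two top-$k$ prefixes over all $k$. Breaking ties by node id and letting $k$ run up to the larger of the two degrees are assumptions; the original method may handle these details differently. In this sketch the overlap is updated incrementally as $k$ grows, so each edge costs $O(d_{\max})$ once the neighborhoods are sorted.

```python
# Illustrative sketch of a non-parametric Simmelian score; tie-breaking and
# the range of k are assumptions.
from typing import Callable, Dict, List, Set, Tuple

Graph = Dict[int, Set[int]]
Edge = Tuple[int, int]


def max_prefix_jaccard(a: List[int], b: List[int]) -> float:
    """max over k of |A_k & B_k| / |A_k | B_k| for the top-k prefixes A_k, B_k."""
    seen_a: Set[int] = set()
    seen_b: Set[int] = set()
    overlap, best = 0, 0.0
    for k in range(max(len(a), len(b))):
        if k < len(a):
            seen_a.add(a[k])
            if a[k] in seen_b:  # element has now entered both prefixes
                overlap += 1
        if k < len(b):
            seen_b.add(b[k])
            if b[k] in seen_a:
                overlap += 1
        union = len(seen_a) + len(seen_b) - overlap
        best = max(best, overlap / union)
    return best


def simmelian_scores(graph: Graph,
                     strength: Callable[[int, int], float]) -> Dict[Edge, float]:
    """Non-parametric Simmelian score of every edge, given a tie strength S(u, v)."""
    # Rank-ordered neighbourhoods: strongest tie first, ties by node id (assumption).
    ranked = {u: sorted(nbrs, key=lambda v, u=u: (-strength(u, v), v))
              for u, nbrs in graph.items()}
    return {(u, v): max_prefix_jaccard(ranked[u], ranked[v])
            for u in graph for v in graph[u] if u < v}


if __name__ == "__main__":
    g: Graph = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
    triangles = lambda u, v: len(g[u] & g[v])  # S = T gives the triadic variant
    print(simmelian_scores(g, triangles))
```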


Edge Forest Fire (EFF): The original Forest Fire node sampling algorithm […] is based on the idea that nodes are “burned” during a fire that starts at a random node and may spread to the neighbors of a burning node. Note that, in contrast to random walks, the fire can spread to more than one neighbor, but already burned neighbors cannot be burned again. The basic intuition is that nodes and edges that get visited more frequently than others during these walks are more important. In order to filter edges instead of nodes, we introduce a variant of the algorithm in which we use the frequency of visits of each edge as a proxy for its relevance. As the total length of all walks is hard to estimate in advance, we cannot give a tight bound for the running time.
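A minimal sketch of the edge variant follows, under assumptions of our own. The number of fires (`fires`) and the burning probability `p` are hypothetical parameters, and, following the usual Forest Fire formulation, a burning node ignites a geometrically distributed number of not-yet-burned neighbors with mean $p/(1-p)$. Each edge along which the fire spreads has its visit counter incremented, and the counters serve as edge scores. Because a node burns at most once per fire, the edges traversed by a single fire form a tree, so no single fire passes along all three edges of a triangle.

```python
# Illustrative sketch; `fires`, `p` and the geometric spreading rule are
# assumptions, not the parameters used in the experiments.
import random
from collections import deque
from typing import Dict, Optional, Set, Tuple

Graph = Dict[int, Set[int]]
Edge = Tuple[int, int]


def edge_forest_fire_scores(graph: Graph, fires: int = 100, p: float = 0.7,
                            seed: Optional[int] = None) -> Dict[Edge, int]:
    """EFF sketch: score = number of times the fire spread along each edge."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must lie in [0, 1)")
    rng = random.Random(seed)
    nodes = sorted(graph)
    visits = {(u, v): 0 for u in nodes for v in graph[u] if u < v}
    for _ in range(fires):
        start = rng.choice(nodes)
        burned = {start}          # a node burns at most once per fire
        queue = deque([start])
        while queue:
            u = queue.popleft()
            # Geometric number of neighbours to ignite: P(n = k) = p**k * (1 - p).
            n_burn = 0
            while rng.random() < p:
                n_burn += 1
            candidates = sorted(v for v in graph[u] if v not in burned)
            rng.shuffle(candidates)
            for v in candidates[:n_burn]:
                burned.add(v)
                visits[(min(u, v), max(u, v))] += 1
                queue.append(v)
    return visits


if __name__ == "__main__":
    g: Graph = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
    print(edge_forest_fire_scores(g, fires=20, seed=7))
```

The total work depends on how far the fires spread, which mirrors the remark above that no tight running-time bound can be given in advance.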


Local Degree (LD): Inspired by the notion of hub nodes, i.e. nodes with locally relatively high degree, as well as the approach of Satuluri et al. […] with their Local Similarity method, we propose the following new sparsification method: for each node $v \in V$, we include the edges to the top $\lfloor \deg(v)^{\alpha} \rfloor$ neighbors, sorted by degree in descending order. Similar to Local Similarity, we again use $1 - \alpha$ as the edge score, where $\alpha$ is the minimum parameter such that the edge is still contained in the sparsified graph. The goal of this approach is to keep those edges in the sparsified graph that lead to nodes with high degree, i.e. the hubs that are crucial for a complex network’s topology. The edges left after filtering form what can be considered a “hub backbone” of the network (see Fig. 1 for an example).

As only the neighbors of each node need to be sorted, this can be done in $O(m \log d_{\max})$. Using linear-time sorting, it is even possible in $O(m)$ time.
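The Local Degree score can be sketched with the same conversion as for Local Similarity, but with neighbor degree as the ranking key: node $v$ keeps its rank-$r$ neighbor for all $\alpha \ge \log r / \log \deg(v)$, an edge is kept if either endpoint keeps it, and the score is $1 - \alpha_{\min}$. Tie-breaking by node id is an assumption of this sketch, and it uses a comparison sort, i.e. the $O(m \log d_{\max})$ variant rather than linear-time sorting.

```python
# Illustrative sketch; tie-breaking by node id is an assumption.
import math
from typing import Dict, Set, Tuple

Graph = Dict[int, Set[int]]
Edge = Tuple[int, int]


def local_degree_scores(graph: Graph) -> Dict[Edge, float]:
    """LD score 1 - alpha_min: node v keeps edges to its top floor(deg(v)**alpha)
    neighbours by degree; an edge survives if either endpoint keeps it."""
    deg = {u: len(nbrs) for u, nbrs in graph.items()}
    alpha_min: Dict[Edge, float] = {}
    for u, nbrs in graph.items():
        # Highest-degree neighbour first; ties broken by node id (sketch assumption).
        ranked = sorted(nbrs, key=lambda v: (-deg[v], v))
        for r, v in enumerate(ranked, start=1):
            # floor(deg(u)**alpha) >= r  <=>  alpha >= log(r) / log(deg(u)).
            a = 0.0 if r == 1 else math.log(r) / math.log(deg[u])
            e = (min(u, v), max(u, v))
            alpha_min[e] = min(a, alpha_min.get(e, 1.0))
    return {e: 1.0 - a for e, a in alpha_min.items()}


if __name__ == "__main__":
    # Hub 0 with four neighbours, plus one edge between two of them.
    g: Graph = {0: {1, 2, 3, 4}, 1: {0, 2}, 2: {0, 1}, 3: {0}, 4: {0}}
    print(local_degree_scores(g))
```

In this toy graph every edge to the hub scores 1, while the edge between the two degree-2 nodes scores 0 and is therefore the first to be filtered out, which is the “hub backbone” behavior described above.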



Figure 1: Drawing of the Jazz musicians collaboration network and the Local Degree sparsified version containing 15% of edges. Node size proportional to degree.


IV. Implementation 

For this study, we created efficient C++ implementations of all considered sparsification methods, and accelerated them using OpenMP parallelization. In particular, RE, LD, LS and the Simmelian Backbone methods (with the exception of the inherently sequential triangle and quadrangle counting algorithms […]) have been parallelized. We implemented the algorithms in NetworKit […], an interactive tool suite for scalable network analysis. It provides a large set of graph algorithm implementations which we used for our experiments. NetworKit combines kernels written in C++ with an interactive Python shell to achieve both high performance and interactivity, a concept we use for our implementations as well. For community detection, we use an efficient implementation of the Louvain method that is also part of NetworKit […]. To get consistent results, a deterministic configuration of this algorithm is used.

Gephi […] is a graph visualization tool which we use not only for visualization purposes but also for interactive exploration of sparsified graphs. To achieve this interactivity, we implemented a client for the Gephi Streaming Plugin in NetworKit. It is designed to stream graph objects from and to Gephi utilizing the JSON format. Using our implementation in NetworKit, a few lines of Python code suffice to sparsify a graph, calculate various network properties, and export it to Gephi for drawing. Separating sparsification into edge score calculation and filtering allows for a high level of interactivity: edge scores are exported from NetworKit to Gephi and filtered dynamically within Gephi.


V. Experimental Study 
A. Quantifying Similarity in Network Properties 

Quantifying the similarity between a network and its sparsified version is an intricate problem. Ideally, a similarity measure should meet the following requirements. a) Ignoring trivial differences: consider, for example, the degree distribution. One cannot expect the distribution to remain identical after edges get removed during sparsification. It is clear, however, that the general shape of the distribution should remain “similar” and that high-degree nodes should remain high-degree nodes in order to consider the degrees as preserved. b) Intuitive and normalized: similarity values from a closed domain like [0,1] allow for aggregation and comparability. A similarity value of 1 indicates that the property under consideration is fully preserved, whereas a value of 0 indicates that similarity is entirely lost. c) Revealing method behavior: a good similarity measure will clearly expose different behavior between sparsification methods. d) Efficiently computable.

Following these requirements, we select the following measures. In order to observe how the network diameter changes through sparsification (Sec. V-D), we plot the quotient of the original network diameter and the resulting diameter, which yields legible results since in practice the diameter does not decrease during sparsification. Both the detection of connected components and the detection of communities yield partitions of the node set into disjoint subsets. As a similarity value we use Normalized Mutual Information (NMI), a common measure for comparing partitions of graphs […]. Node degree, betweenness and PageRank can be treated as node centrality indices which represent a ranking of nodes by structural importance. Since absolute values of the centrality scores are less interesting than the resulting rank order, we compare the rankings before and after sparsification using Spearman’s $\rho$ rank correlation coefficient. (This focus on rank order is also the reason why we did not adopt the Kolmogorov-Smirnov statistic used in […], which compares distributions of absolute values.) Even though the local clustering coefficient can be interpreted as a centrality score as well, the comparison of ranks does not seem meaningful in this case because it is a local score. Instead, we analyse the deviation of the average local clustering coefficient from the original value.
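For reference, Spearman's $\rho$ is the Pearson correlation of the two rank vectors. The sketch below gives tied values their average rank, which is one common convention (assumed here, not necessarily the one used in our experiments). The two inputs would be the centrality values of the same nodes before and after sparsification, listed in the same node order.

```python
# Illustrative sketch; average ranks for ties is an assumed convention.
import math
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the block of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of the rank vectors of x and y (same length, n >= 2)."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(var_x * var_y)  # undefined if either input is constant


if __name__ == "__main__":
    before = [4, 2, 2, 1, 1]   # e.g. node degrees in the original graph
    after = [3, 1, 2, 1, 1]    # the same nodes after sparsification
    print(spearman_rho(before, after))
```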


B. Setup 

Experiments were performed on a multicore compute server with 4 physical Intel Core i7 cores at 3.4 GHz, 8 threads, and 32 GB of memory. For this explorative study, we use a collection of 100 social networks representing early snapshots of Facebook, each of which is a student online friendship network for a US university […]. Sizes of the networks are between 10k and 1.6 million edges. For the plots in Sec. V-D, we aggregate experimental results over this set. We chose to focus on a set of networks of one type, i.e. with a common origin and high structural similarity among the networks, in order to get meaningful aggregated values. It remains an open question to what extent the results can be transferred to other types of complex networks, since in our experience the performance of network analysis algorithms depends strongly on the network structure.

C. Correlations between Edge Scores 

Among our sparsification methods, some are more similar to others in the sense that they tend to preserve similar edges. Such similarities can be clarified by studying correlations between edge scores. We calculate edge score correlations for the set of 100 social networks as follows: for each single network, edge scores are calculated with the various scoring methods and Spearman’s rank correlation coefficient is applied to each pair of methods. The coefficient is then averaged over all networks and plotted in the correlation matrix (Figure 2). There is one column for each method, and the column Mod represents edge scores that are 1 for intra-community edges and 0 for inter-community edges after running a modularity-maximizing community detection algorithm. Positive correlations with these scores indicate that the respective rating method assigns high scores to edges within modularity-based communities.
Figure 2: Edge score correlations (Spearman’s $\rho$)

Interpretation of the results is challenging: the correlations we observe reflect intrinsic, mathematical similarities of the rating algorithms on the one hand, but on the other hand they are also caused by the structure of this specific set of social networks (e.g., it may be a characteristic of a given network that edges leading to high-degree nodes are also embedded in many triangles). Nonetheless, we note the following observations. The methods intended to preserve edges within dense subgraphs (LS, QLS, TS) are clearly positively correlated, and also have positive correlations with modularity-based communities and the number of triangles an edge is embedded in (Tri). Interestingly, a strong positive correlation exists between Local Similarity and Quadrilateral Simmelian Backbone, which have different computational costs, but are predicted to show similar sparsifying effects. The two new methods we introduce, Edge Forest Fire and Local Degree, set themselves apart from this class, while also being negatively correlated with each other. LD tends to favor edges between dense regions, unlike the LS and Simmelian methods. The strong negative correlation between EFF and triangle count can be explained by the fact that the Edge Forest Fire can never “burn” a triangle, as nodes cannot be visited twice.

D. Preservation of Properties

In the following plots, the measures discussed in Sec. V-A are shown on the y-axis for a given ratio of kept edges ($m'/m$) on the x-axis (e.g., a ratio of 0.2 means that 20% of edges are still present).

Figure 3: Preservation of global network properties. (a) Original network diameter divided by network diameter; (b) deviation from original clustering coefficient. x-axis: ratio of kept edges.

Diameter: We motivated the Local Degree method with the idea that shortest paths commonly run through hub nodes in social networks. Therefore, preserving edges leading to high-degree nodes should preserve the small diameter. This is confirmed by our experiments (Figure 3a). In contrast, methods that prefer edges within dense regions clearly do not preserve the diameter. With Simmelian Backbones the diameter drops when only few edges are left; this can be explained by the fact that Simmelian Backbones do not maintain connectivity, so that eventually the graph is decomposed into multiple connected components which have a smaller diameter.

Clustering Coefficient: As far as the average local clustering coefficient is concerned, we observe three classes of sparsification methods. Figure 3b shows the deviation from the original value, averaged over all graphs in our dataset. For both RE and EFF, which are based on randomness, the clustering coefficient drops almost linearly as the ratio of kept edges decreases. TS, QLS and LS keep mostly edges within dense regions, which results in increasing clustering coefficients. Note that our method LD keeps the deviation close to zero for sample sizes down to 10%.

Node Centrality Measures: The similarity of curves in Figure 4 catches the eye immediately: for these node centrality measures, the sparsification methods behave in a very similar way, with random edge deletion and Local Degree performing best and Edge Forest Fire failing early. This similarity could be explained by strong correlations between node degree, PageRank and betweenness, which have been observed before (e.g. […]). EFF fails because it cannot preserve node degrees, as the expected number of randomly selected incident edges via the “burning process” is relatively low even for high-degree nodes. In accordance with our intuition that edges leading to high-degree neighbors are important and should be preserved, our experiments show that the Local Degree method preserves multiple node centralities (i.e. degree, PageRank and betweenness). Random Edge filtering is also quite good at preserving betweenness, PageRank and degree centrality. Again, methods that are focused on keeping edges within dense regions are not as good at preserving these properties. However, filtering locally seems to help the Local Similarity sparsification technique to still perform better than Simmelian Backbones.


Figure 4: Preservation of node centrality measures. (b) Spearman’s rank correlation coefficient for betweenness centrality; (c) Spearman’s rank correlation coefficient for PageRank centrality. x-axis: ratio of kept edges.

Components and Communities: As shown in Figure 5c, our Local Degree method best preserves the connected components of the graph. Random edge filtering and Simmelian Backbones, on the contrary, tend to separate nodes of degree 1 from the rest of the graph. Local Similarity also preserves connectivity, but as its retained edges are not directed towards central hubs, it easily disconnects small groups of nodes. According to the NMI measure in Figure 5a, it seems that random edge sampling is best suited for preserving the community structure as it is found by the Louvain method. However, if we consider the number of communities in Figure 5b, the results are quite different. The Simmelian Backbones generate singletons rather quickly, which explains why the number of communities increases so quickly. Random edge filtering leads to the same phenomenon. Nevertheless, the communities found seem to differ significantly for all sparsification methods. The Local Degree sparsification method is the only method we consider that is able to keep the number of communities relatively unchanged up to a very high degree of sparsification. The nonetheless rather low NMI similarity values can be explained by the following behavior: consider a hub node $x$ within a community with neighbors that are for the most part also connected to a hub node $y$ with higher degree than $x$. Due to the way Local Degree scores edges, $x$ will lose many of its connections within the community and may be pulled into the community of a neighboring high-degree node $z$ that is not part of the original community of $x$. As most real-world networks do not have one community structure but many, it has to be left as an open question whether those sparsification techniques that keep the number of communities within a reasonable range simply find different communities.

Figure 5: Preservation of network cohesion. (a) Normalized Mutual Information between partitions into communities; (b) number of communities in original graph divided by number of communities in sparsified graph (according to PLM community detection, average over all graphs); (c) NMI of connected components. x-axis: ratio of kept edges.

E. Running Time

Measured running times are shown in Fig. 6. Apart from Random Edge sparsification, our Local Degree method is clearly the fastest method and scales linearly with the number of edges, which makes it well suited for large-scale networks in the range of millions to billions of edges. LS is also fast and could be further accelerated using inexact Jaccard coefficient calculation. Both Simmelian methods are significantly slower than the other methods, but still efficient enough for the network sizes we consider. While the time complexity of EFF in O-notation is difficult to assess, it is only slightly faster than the Simmelian methods and not among the fastest methods.

VI. Conclusion 

Our experimental study shows that several methods are capable of preserving a set of relevant properties of social networks when up to 80% of edges have been removed. Random edge deletion performs surprisingly well and retains a wide range of properties, but more targeted methods can perform even better. Our novel Local Degree (LD) method seems to be suited for preserving node centralities like degree, PageRank and betweenness. Also, the clustering, connectedness and the typically small diameter of complex networks are significantly better preserved than through random deletion. This supports the initial motivation of LD, namely that connections to hubs are highly important for a network’s structure. Only the community structure seems to be discarded by LD, where RE actually performs best of all methods considered. Furthermore, LD is only slightly more computationally expensive than random edge selection, and therefore applicable to very large networks. The LS method has been developed to support community detection, and we confirm its suitability for this purpose. However, network diameter and node centralities tend to get distorted. Our adaptation of the Forest Fire sampling algorithm to edge scoring fails at preserving node centralities, but is the second best at keeping the network diameter.

Figure 6: Running times of various edge scoring methods on our graph dataset (x-axis: number of edges). The y-axis is linearly scaled below 1 and logarithmically above.

We hope that the conceptual framework of edge scoring and filtering as well as our evaluation methods are steps towards a more unified perspective on a variety of related methods that have been proposed in different contexts. Future developments can be easily carried out within this framework and based on our implementations, which will be available as part of a future release of the open-source network analysis package NetworKit*.

* https://networkit.iti.kit.edu/